AITopics | nonzero component

A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $α$ and the variable selection ratio $ρ$ for which a simple perceptron can perfectly classify $P = αN$ random patterns by optimally selecting $M = ρN$ variables out of $N$ variables. While the Cover-Gardner theory establishes that a random subset of $ρN$ dimensions can separate $αN$ random patterns if and only if $α < 2ρ$, we demonstrate that optimal variable selection can surpass this bound by developing a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
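
As a point of reference for the $α < 2ρ$ threshold quoted above, the classical counting result for a random subset of variables can be evaluated directly. The sketch below is not the replica calculation described in the abstract; it assumes Cover's function-counting formula for a bias-free perceptron on points in general position, and the name `cover_separable_fraction` is made up for illustration.

```python
from math import comb


def cover_separable_fraction(num_patterns: int, dim: int) -> float:
    """Fraction of the 2**P labelings of P points in general position in
    R^dim that a bias-free perceptron can realize (Cover's counting formula).

    Assumption: points are in general position; dim plays the role of
    M = rho * N and num_patterns the role of P = alpha * N.
    """
    if num_patterns <= 0:
        return 1.0
    # C(P, dim) = 2 * sum_{k=0}^{dim-1} binom(P - 1, k)
    count = 2 * sum(comb(num_patterns - 1, k) for k in range(dim))
    return count / 2 ** num_patterns


if __name__ == "__main__":
    dim = 100  # M = rho * N randomly chosen variables
    for ratio in (1.0, 1.5, 2.0, 2.5, 4.0):  # ratio = P / M = alpha / rho
        patterns = int(ratio * dim)
        print(ratio, cover_separable_fraction(patterns, dim))
```

For a fixed number of randomly chosen variables, the fraction equals exactly one half at $P = 2M$ and falls off sharply on either side as the dimension grows, which is the $α = 2ρ$ boundary that optimal selection is claimed to exceed.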

perceptron, storage capacity, variable selection, (14 more...)

arXiv.org Machine Learning

2512.01861

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.83)

Reviews: Sparse PCA from Sparse Linear Regression

Neural Information Processing Systems, Oct-7-2024, 14:03:17 GMT

The paper proposes an approach to reduce solving a special sparse PCA to a sparse linear regression (SLR) problem (treated as a black-box solution). It uses the spiked covariance model [17] and assumes that the number of nonzero components of the direction (u) is known, plus some technical conditions such as a restricted eigenvalue property. The authors propose algorithms for both hypothesis testing and support recovery, as well as provide theoretical performance guarantees for them. Finally, the paper argues that the approach is robust to rescaling and presents some numerical experiments comparing two variants of the method (based on SLR methods FoBa and LASSO) with two alternatives (diagonal thresholding and covariance thresholding). Strengths: - The addressed problem (sparse PCA) is interesting and important.

assumption, linear regression, sparse linear regression, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Reduced-Space Iteratively Reweighted Second-Order Methods for Nonconvex Sparse Regularization

Wang, Hao, Yang, Xiangyu, Zhu, Yichen

arXiv.org Artificial Intelligence, Aug-17-2024

This paper explores a specific class of nonconvex sparsity-promoting regularization problems, namely those involving $\ell_p$-norm regularization in conjunction with a twice continuously differentiable loss function. We propose a novel second-order algorithm designed to effectively address this class of challenging nonconvex and nonsmooth problems, showcasing several innovative features: (i) The use of an alternating strategy to solve a reweighted $\ell_1$ regularized subproblem and the subspace approximate Newton step. (ii) The reweighted $\ell_1$ regularized subproblem relies on a convex approximation to the nonconvex regularization term, enabling a closed-form solution characterized by the soft-thresholding operator. This feature allows our method to be applied to various nonconvex regularization problems. (iii) Our algorithm ensures that the iterates maintain their sign values and that nonzero components are kept away from 0 for a sufficient number of iterations, eventually transitioning to a perturbed Newton method. (iv) We provide theoretical guarantees of global convergence, local superlinear convergence in the presence of the Kurdyka-Łojasiewicz (KL) property, and local quadratic convergence when employing the exact Newton step in our algorithm. We also showcase the effectiveness of our approach through experiments on a diverse set of model prediction problems.
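
The closed-form soft-thresholding solution mentioned in feature (ii) can be illustrated with a small sketch. It assumes the common iteratively reweighted $\ell_1$ weights $w_i = p(|x_i| + \epsilon)^{p-1}$ obtained by linearizing $|x_i|^p$; the abstract does not state the paper's exact weights, so the weight formula, the smoothing constant `eps`, and both function names are assumptions for illustration only.

```python
def soft_threshold(z: float, tau: float) -> float:
    """Soft-thresholding: argmin over x of 0.5 * (x - z)**2 + tau * |x|."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def reweighted_l1_prox(z, x_prev, lam, p=0.5, eps=1e-3):
    """One reweighted l1 proximal step, applied coordinate-wise.

    Assumption: weights come from linearizing |x|**p at the previous
    iterate, w_i = p * (|x_prev_i| + eps) ** (p - 1).
    """
    out = []
    for z_i, x_i in zip(z, x_prev):
        w_i = p * (abs(x_i) + eps) ** (p - 1.0)
        out.append(soft_threshold(z_i, lam * w_i))
    return out


if __name__ == "__main__":
    # A large previous entry gets a small weight; a zero entry gets a large one.
    print(reweighted_l1_prox([2.0, 0.1], [2.0, 0.0], lam=0.1))
```

Under these assumed weights, entries that were already small receive a heavy penalty and are driven to zero, while large entries are shrunk only slightly.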

convergence, soirl 1, subproblem, (16 more...)

arXiv.org Artificial Intelligence

2407.17216

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Challenge of Differentially Private Screening Rules

Khanna, Amol, Lu, Fred, Raff, Edward

arXiv.org Artificial Intelligence, Mar-17-2023

Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data analysis, especially in information retrieval problems where n-grams over text with TF-IDF or Okapi feature values are a strong and easy baseline. Over the past decade, screening rules have risen in popularity as a way to reduce the runtime for producing the sparse regression weights of $L_1$ models. However, despite the increasing need for privacy-preserving models in information retrieval, to the best of our knowledge, no differentially private screening rule exists. In this paper, we develop the first differentially private screening rule for linear and logistic regression. In doing so, we discover difficulties in the task of making a useful private screening rule due to the amount of noise added to ensure privacy. We provide theoretical arguments and experimental evidence that this difficulty arises from the screening step itself and not the private optimizer. Based on our results, we highlight that developing an effective private $L_1$ screening method is an open problem in the differential privacy literature.

artificial intelligence, machine learning, screening rule, (17 more...)

arXiv.org Artificial Intelligence

2303.10303

Country:

North America > United States > Maryland > Anne Arundel County > Annapolis (0.05)
North America > United States > Maryland > Baltimore County (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.89)

Industry: Information Technology > Security & Privacy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Asymptotic Properties for Bayesian Neural Network in Besov Space

Lee, Kyeongwon, Lee, Jaeyong

arXiv.org Artificial Intelligence, Nov-25-2022

Neural networks have shown great predictive power when dealing with various unstructured data such as images and natural languages. The Bayesian neural network captures the uncertainty of prediction by placing a prior distribution on the parameters of the model and computing the posterior distribution. In this paper, we show that the Bayesian neural network using a spike-and-slab prior is consistent, with a nearly minimax convergence rate, when the true regression function is in the Besov space. Even when the smoothness of the regression function is unknown, the same posterior convergence rate holds, and thus the spike-and-slab prior is adaptive to the smoothness of the regression function. We also consider the shrinkage prior, which is more feasible than other priors, and show that it has the same convergence rate. In other words, we propose a practical Bayesian neural network with guaranteed asymptotic properties.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2206.00241

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Does SLOPE outperform bridge regression?

Wang, Shuaiwen, Weng, Haolei, Maleki, Arian

arXiv.org Machine Learning, Sep-20-2019

The recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that the MSE concentrates around takes a complicated and implicit form. Through a delicate analysis of this quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators is sub-optimal compared with bridge regression such as the Ridge estimator.
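
The statement that LASSO belongs to the SLOPE family follows from the form of the SLOPE penalty, the sorted $\ell_1$ norm $\sum_i \lambda_i |x|_{(i)}$ with $\lambda_1 \ge \dots \ge \lambda_p \ge 0$ and $|x|_{(1)} \ge \dots \ge |x|_{(p)}$. The sketch below evaluates only this penalty, not the estimator or the error analysis from the abstract; the function name `sorted_l1_norm` is illustrative.

```python
def sorted_l1_norm(x, lambdas):
    """Sorted l1 norm: the largest weight multiplies the largest |x_i|.

    lambdas must be non-increasing and non-negative, one weight per entry.
    """
    if len(x) != len(lambdas):
        raise ValueError("x and lambdas must have the same length")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be non-increasing")
    if lambdas and lambdas[-1] < 0:
        raise ValueError("lambdas must be non-negative")
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


if __name__ == "__main__":
    x = [1.0, -3.0, 2.0]
    print(sorted_l1_norm(x, [3.0, 2.0, 1.0]))  # decreasing weights
    print(sorted_l1_norm(x, [1.0, 1.0, 1.0]))  # equal weights: plain l1 norm
```

With all weights equal, the penalty reduces to a multiple of the ordinary $\ell_1$ norm, which is the LASSO penalty.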

estimator, slope estimator, xnull 2 2, (13 more...)

arXiv.org Machine Learning

1909.09345

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

Murata, Tomoya, Suzuki, Taiji

Neural Information Processing Systems, Dec-31-2018

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times. It is shown that the methods achieve essentially a sample complexity of $O(1/\varepsilon)$ to attain an error of $\varepsilon$ under a variant of the restricted eigenvalue condition, and the rate has better dependency on the problem dimension than existing methods. In particular, if the smallest magnitude of the non-zero components of the optimal solution is not too small, the rate of our proposed Hybrid algorithm can be boosted to near the minimax optimal sample complexity of full information algorithms. The core ideas are (i) efficient construction of an unbiased gradient estimator by the iterative usage of the hard thresholding operator for configuring an exploration algorithm; and (ii) an adaptive combination of the exploration and exploitation algorithms for quickly identifying the support of the optimum and efficiently searching the optimal parameter in its support. Experimental results are presented to validate our theoretical findings and the superiority of our proposed methods.
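
The hard thresholding operator named in idea (i) keeps the $k$ largest-magnitude entries of a vector and sets the rest to zero. The following is a minimal sketch of that operator alone, assuming ties are broken in favor of the lower index; it is not the authors' exploration or Hybrid algorithm, and `hard_threshold` is an illustrative name.

```python
def hard_threshold(x, k):
    """Keep the k entries of x with the largest magnitude; zero the others.

    Assumption: ties are broken in favor of the lower index.
    """
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude (stable sort keeps index order on ties).
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


if __name__ == "__main__":
    print(hard_threshold([0.5, -3.0, 2.0, 0.1], 2))
```

The surviving indices form the estimated support, which is what a sparse regression method then refines.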

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

Murata, Tomoya, Suzuki, Taiji

arXiv.org Machine Learning, Sep-5-2018

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times. It is shown that the methods achieve essentially a sample complexity of $O(1/\varepsilon)$ to attain an error of $\varepsilon$ under a variant of the restricted eigenvalue condition, and the rate has better dependency on the problem dimension than existing methods. In particular, if the smallest magnitude of the non-zero components of the optimal solution is not too small, the rate of our proposed Hybrid algorithm can be boosted to near the minimax optimal sample complexity of full information algorithms. The core ideas are (i) efficient construction of an unbiased gradient estimator by the iterative usage of the hard thresholding operator for configuring an exploration algorithm; and (ii) an adaptive combination of the exploration and exploitation algorithms for quickly identifying the support of the optimum and efficiently searching the optimal parameter in its support. Experimental results are presented to validate our theoretical findings and the superiority of our proposed methods.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1809.01765

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Privacy Preserving Identification Using Sparse Approximation with Ambiguization

Razeghi, Behrooz, Voloshynovskiy, Slava, Kostadinov, Dimche, Taran, Olga

arXiv.org Machine Learning, Sep-29-2017

A. Identification and ANN Search: Many modern applications, such as biometrics, digital physical object security, and data generated by connected objects in the IoT, require privacy-preserving identification of a query with respect to a given dataset. Practically, the identification problem is based on an ANN search, in which a list of indices corresponding to the NN items is returned. At the final refinement stage, the list can be refined in a private setting and a single index is declared as the identified one. The identification problem faces the curse of dimensionality. For this reason, exact identification is replaced by a search for a list of closest items, i.e., one trades off the accuracy of identification against the search complexity. In recent years, many methods providing efficient ANN solutions for multi-billion entry datasets have been proposed, and we name some of them without pretending to be exhaustive in our overview [1]-[3]. B. Search in Privacy Preserving Settings, Main Considerations: Due to the massive amount of data and to modern distributed storage and computing facilities, many ANN problems are considered in a setting where the data user outsources his datasets, after applying the corresponding protection measures, to third parties (servers) possessing powerful storage, communications and computing facilities. The need for data protection comes from many perspectives related to the cost of data collection and to data as a "product" that represents great value in the era of machine learning, since it can be used to train and prune new and existing machine learning tools. Moreover, the server might want to discover some hidden relationships in the data.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1709.10297

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Sparse recovery via Orthogonal Least-Squares under presence of Noise

Hashemi, Abolfazl, Vikalo, Haris

arXiv.org Machine Learning, Aug-8-2016

We consider the Orthogonal Least-Squares (OLS) algorithm for the recovery of an $m$-dimensional $k$-sparse signal from a low number of noisy linear measurements. The Exact Recovery Condition (ERC) in the bounded noise scenario is established for OLS under certain conditions on the nonzero elements of the signal. The new result also improves the existing guarantees for the Orthogonal Matching Pursuit (OMP) algorithm. In addition, this framework is employed to provide probabilistic guarantees for the case in which the coefficient matrix is drawn at random according to a Gaussian or Bernoulli distribution, where we exploit some concentration properties. It is shown that under certain conditions, OLS recovers the true support in $k$ iterations with high probability. This in turn demonstrates that ${\cal O}\left(k\log m\right)$ measurements are sufficient for exact recovery of sparse signals via OLS.

artificial intelligence, iteration, machine learning, (16 more...)

arXiv.org Machine Learning

1608.02554

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (0.34)

Technology: